Capstone Project: Potential Ice Cream Shop Locations in the Greater Rochester Area

In [1]:
# MODULES
import numpy as np
import folium
from folium import FeatureGroup, LayerControl, Map, Marker, CircleMarker
import requests 
from geopy.geocoders import Nominatim
from pyproj import CRS
import pyproj
import json
import os
import pandas as pd
from IPython.display import display, HTML
from sklearn.cluster import DBSCAN
%matplotlib inline
import matplotlib.pyplot as plt
np.set_printoptions(precision=3,suppress=True)

Introduction: Background and Motivation

Back to Table of Contents

The goal is to find a promising location to promote and sell new artisan ice cream. The constraint is that this location should be within the greater Rochester area in upstate New York to keep the work commute for the owners within reasonable bounds. The target audience of this project is the owners or any other locals interested in the ice cream business.

The challenge is that there are already a number of well-established ice cream parlors. Even though we expect that the new owners would not shy away from competition by simply setting up business in popular locations, it may be prudent to examine how many ice cream shops particular locations can support, what makes them special, and whether there are locations with similar characteristics that have not yet been discovered by the ice cream business.

This project aims to answer some of these questions to give the new ice cream shop owners the best shot at establishing a successful business.

In [2]:
# Parameters
Load_from_files = True  # if set True, data is loaded from files from earlier sessions rather ... 
                        # ... than calling functions/methods/APIs (e.g., Foursquare) generating them

Locations of Interest

There are two locations on which the analysis is based:

  1. Rochester city center, which lies at the center of the target area (Monroe County)
  2. The village of Fairport. It is considered close to the edge of the area of interest and is used to define the search radius. Fairport also serves as an example location of successful ice cream shops. See below in Analysis.

Their coordinates, obtained from 'geopy', and the distance between them, computed with 'pyproj', are:

In [3]:
if Load_from_files:
    PointsOfInt = np.load('DataFiles\\PointsOfInt.npy', allow_pickle=True).item()
    for i in PointsOfInt.keys():
        print(i,': ',PointsOfInt[i]['point'])
else:
    PointsOfInt = {'Rochester': {'address':'Rochester, NY', 'point': None}, 
                   'Fairport': {'address':'Fairport, NY','point': None}}

    geolocator = Nominatim(user_agent="Monroe_explorer")
    for name, PoI in PointsOfInt.items():
        location = geolocator.geocode(PoI['address'])
        PoI['point'] = location.point[:2]
        print(name, ': ', PoI['point'], sep='')

geod_wgs84 = CRS("epsg:4326").get_geod()

az12, az21, dist = geod_wgs84.inv(PointsOfInt['Rochester']['point'][1] , PointsOfInt['Rochester']['point'][0],
                                  PointsOfInt['Fairport']['point'][1], PointsOfInt['Fairport']['point'][0])
print(f'Distance between them:{dist*0.62137/1000:4.1f} mi')
Rochester :  (43.157285, -77.615214)
Fairport :  (43.0993, -77.443014)
Distance between them: 9.6 mi

An extensive search for data such as statistics on towns and neighborhoods in databases on Monroe County (or the greater Rochester area) was conducted, leading to sites such as the NYU Spatial Data Repository, Monroe County GIS, NYS GIS Clearinghouse, etc. Instead of collecting sparse data, converting the different formats, parsing data from individual web pages, etc., it was found that retrieving venue data directly from Foursquare and then processing it is more effective.

For the queries the more general endpoint 'explore' was used rather than 'search' to be more inclusive. Multiple queries were conducted with terms that may serve as surrogates for 'ice cream'.

The relevant data was then cast into Pandas dataframes. The 'Categories' column contains all categories from the JSON files generated by the queries, to catch as much as possible. They were concatenated into a string to be used in later filtering.
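As a minimal illustration of the filtering approach described above (toy strings, not the actual query results), a venue is kept when the set rebuilt from its concatenated category string intersects a chosen filter set:

```python
# Toy example of the later filtering step: a venue's categories, stored as one
# concatenated string, are split back into a set and matched against a filter set.
categories = "Ice Cream Shop, Ice Cream Shops, Ice Cream"            # hypothetical 'Categories' cell
filter_keys = {"Ice Cream Shop", "Frozen Yogurt Shop", "Dessert Shop"}

venue_cats = set(categories.split(", "))
keep = bool(filter_keys & venue_cats)   # non-empty intersection -> keep the row
print(keep)                             # prints True
```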

In [4]:
# FourSquare ID
CLIENT_ID = 'xxxxxxxxx' # Foursquare ID
CLIENT_SECRET = 'xxxxx' # Foursquare Secret
VERSION = '20180604'
LIMIT = 300
In [5]:
# Query Parameters
Location = 'Rochester'
queries = ['Ice Cream','Coffee','Latte', 'Cafe','Beer', 'Breweries', 'Pubs']
radius = dist+1000
In [6]:
display(HTML(f"<h3>Query Scores:</h3>"))
QueryResults = []
if Load_from_files:
    if os.path.exists('JSONfiles'):
        for query in queries:
            try:
                with open('JSONfiles\\'+query+'.json') as json_file:  
                    results = json.load(json_file)
            except FileNotFoundError:
                print(f'File for \'{query}\' missing')
                continue
            QueryResults.append(results)    
            print(f'Query {query} produced {len(results["response"]["groups"][0]["items"])} hits.' )
    else: print('json file folder missing')
else:
    for query in queries:
        url = ('https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&ll={},{}&v={}&query={}&radius={}&limit={}'\
               .format(CLIENT_ID, CLIENT_SECRET, PointsOfInt[Location]['point'][0], PointsOfInt[Location]['point'][1], 
                       VERSION, query, radius, LIMIT))
        results = requests.get(url).json()
        with open('JSONfiles\\'+query+'.json', 'w') as outfile:  
            json.dump(results, outfile)
        QueryResults.append(results)
        print(f'Query {query} produced {len(results["response"]["groups"][0]["items"])} hits.' )

Query Scores:

Query Ice Cream produced 95 hits.
Query Coffee produced 100 hits.
Query Latte produced 41 hits.
Query Cafe produced 100 hits.
Query Beer produced 100 hits.
Query Breweries produced 31 hits.
Query Pubs produced 59 hits.
In [7]:
# Build dataframes
df_queries = dict.fromkeys(queries)
for i,query in enumerate(queries):
    df_queries[query] = pd.DataFrame(columns=['Name','Address','City', 'Lat', 'Lng', 'Categories'])
    for j, item in enumerate(QueryResults[i]['response']['groups'][0]['items']):
        df_queries[query].loc[j,'Name'] = item['venue']['name']
        venue_loc = item['venue']['location']  # local name avoids clobbering the Location query parameter
        if 'address' in venue_loc: df_queries[query].loc[j,'Address'] = venue_loc['address']
        if 'city' in venue_loc: df_queries[query].loc[j,'City'] = venue_loc['city']
        df_queries[query].loc[j,'Lat'] = venue_loc['lat']
        df_queries[query].loc[j,'Lng'] = venue_loc['lng']
        cat = item['venue']['categories'][0]
        df_queries[query].loc[j,'Categories'] = ', '.join([cat[key] for key in ['name', 'pluralName', 'shortName']])
In [8]:
display(HTML(f"<h3>First few lines of dataframes from the queries:</h3>"))
for query in queries:
    display(HTML(f"<h4>Query '{query}':</h4>"))
    display(HTML(df_queries[query].head(3).to_html()))

First few lines of dataframes from the queries:

Query 'Ice Cream':

Name Address City Lat Lng Categories
0 Hedonist Artisan Ice Cream 672 South Ave Rochester 43.1424 -77.6045 Ice Cream Shop, Ice Cream Shops, Ice Cream
1 Pittsford Farms Dairy 44 N Main St Pittsford 43.0942 -77.5131 Ice Cream Shop, Ice Cream Shops, Ice Cream
2 Netsins Ice Cream 290 Culver Pkwy Rochester 43.1718 -77.5517 Ice Cream Shop, Ice Cream Shops, Ice Cream

Query 'Coffee':

Name Address City Lat Lng Categories
0 Java's Cafe 16 Gibbs St Rochester 43.1574 -77.6014 Coffee Shop, Coffee Shops, Coffee Shop
1 Rochester Public Market 280 Union St N Rochester 43.1652 -77.5892 Farmers Market, Farmers Markets, Farmer's Market
2 Fifth Frame Brewing Company 155 Saint Paul St Rochester 43.1599 -77.6102 Coffee Shop, Coffee Shops, Coffee Shop

Query 'Latte':

Name Address City Lat Lng Categories
0 Equal Grounds Coffeeshop & Books 750 South Ave Rochester 43.1407 -77.605 Café, Cafés, Café
1 Spot Coffee 200 East Ave Rochester 43.1564 -77.5991 Coffee Shop, Coffee Shops, Coffee Shop
2 Java's Cafe 16 Gibbs St Rochester 43.1574 -77.6014 Coffee Shop, Coffee Shops, Coffee Shop

Query 'Cafe':

Name Address City Lat Lng Categories
0 Jembetat African Gallery & Cafe 645 Park Ave Rochester 43.1483 -77.5785 Café, Cafés, Café
1 Java's Cafe 16 Gibbs St Rochester 43.1574 -77.6014 Café, Cafés, Café
2 Equal Grounds Coffeeshop & Books 750 South Ave Rochester 43.1407 -77.605 Café, Cafés, Café

Query 'Beer':

Name Address City Lat Lng Categories
0 Tap & Mallet 381 Gregory St Rochester 43.1428 -77.6017 Beer Garden, Beer Gardens, Beer Garden
1 Roc Brewing Co., LLC 56 S Union St Rochester 43.1534 -77.5977 Beer Garden, Beer Gardens, Beer Garden
2 Acme Bar & Pizza 495 Monroe Ave Rochester 43.1466 -77.5938 Bar, Bars, Bar

Query 'Breweries':

Name Address City Lat Lng Categories
0 Roc Brewing Co., LLC 56 S Union St Rochester 43.1534 -77.5977 Brewery, Breweries, Brewery
1 Genesee Brewery 445 Saint Paul St Rochester 43.1635 -77.6137 Brewery, Breweries, Brewery
2 Swiftwater Brewing 378 Mount Hope Ave Rochester 43.1425 -77.612 Brewery, Breweries, Brewery

Query 'Pubs':

Name Address City Lat Lng Categories
0 The Old Toad 277 Alexander St Rochester 43.1537 -77.5954 Pub, Pubs, Pub
1 The Genesee Brew House 25 Cataract St Rochester 43.1636 -77.6147 Pub, Pubs, Pub
2 Tap & Mallet 381 Gregory St Rochester 43.1428 -77.6017 Pub, Pubs, Pub

Methodology

Back to Table of Contents

This section combines the dataframes, examines the unique terms, and then winnows them down to filter keys. The filters produced two general venue groups that may serve as surrogates and one that is associated with ice cream itself. Cluster analysis and possible correlations between the groups are examined to explore potential new locations for ice cream parlors.

The descriptive headings/labels for the groups are:

  1. Ice Cream
  2. Coffee
  3. Beer

Please go to the Analysis section to learn in more detail how this methodology is applied.

Ice Cream Parlors: Filters to narrow down locations that are in some way associated with 'Ice Cream'

'Ice Cream' is used very loosely here. The idea is to look for places that could compete with ice cream parlors.

In [9]:
display(HTML(f"<h4>Unique Categories for 'Ice Cream':</h4>"))
display(pd.Series(df_queries['Ice Cream']['Categories'].unique()))

Unique Categories for 'Ice Cream':

0            Ice Cream Shop, Ice Cream Shops, Ice Cream
1                                     Farm, Farms, Farm
2                  Burger Joint, Burger Joints, Burgers
3       Frozen Yogurt Shop, Frozen Yogurt Shops, Yogurt
4         Deli / Bodega, Delis / Bodegas, Deli / Bodega
5                      Gastropub, Gastropubs, Gastropub
6                                        Bar, Bars, Bar
7                                        Pub, Pubs, Pub
8     American Restaurant, American Restaurants, Ame...
9     Fast Food Restaurant, Fast Food Restaurants, F...
10           Breakfast Spot, Breakfast Spots, Breakfast
11                        Creperie, Creperies, Creperie
12               Coffee Shop, Coffee Shops, Coffee Shop
13                Dessert Shop, Dessert Shops, Desserts
14                     Pizza Place, Pizza Places, Pizza
dtype: object
In [10]:
filter_IceCream = ['Ice Cream Shop', 'Ice Cream Shops', 'Ice Cream', 
                    'Frozen Yogurt Shop', 'Frozen Yogurt Shops', 'Yogurt',
                    'Dessert Shop', 'Dessert Shops', 'Desserts']
display(HTML(f"<h4>Filter Set:</h4>"))
for item in filter_IceCream[:-1]: print(item, end=', ')
print(filter_IceCream[-1])  
filter_IceCream = set(filter_IceCream)

Filter Set:

Ice Cream Shop, Ice Cream Shops, Ice Cream, Frozen Yogurt Shop, Frozen Yogurt Shops, Yogurt, Dessert Shop, Dessert Shops, Desserts
In [11]:
# FILTERING of 'Ice Cream'
# To filter the dataframe, the strings in the 'Categories' column are split by ', ' into a list ...
# ... which is then converted into a set. This set is compared with the filter set. If the ...
# ... intersection of both sets is not empty, the row is accepted.

idx = []
for i in range(len(df_queries['Ice Cream'])):
    idx.append(filter_IceCream & set(df_queries['Ice Cream'].loc[i,'Categories'].split(', ')) != set())
df_IceCream = df_queries['Ice Cream'][idx].reset_index(drop=True)


display(HTML(f"<Large>The filtered dataframe 'Ice Cream' now has {len(df_IceCream)} entries that represent locations "+
      'that serve ice cream or edibles closely related to ice cream</Large>'))
The filtered dataframe 'Ice Cream' now has 75 entries that represent locations that serve ice cream or edibles closely related to ice cream
In [12]:
# Concatenate dataframes
df_Cafe = pd.concat([df_queries[query] for query in queries[1:4]])
df_Cafe = df_Cafe.reset_index(drop=True)

Coffee Shops or Cafés

The dataframes from 'Coffee', 'Latte', and 'Cafe' are combined into one dataframe, and a methodology similar to that for 'Ice Cream' is applied.

In [13]:
display(HTML("<h4>Unique Categories for 'Caf\u00e9':</h4>"))
display(pd.Series(df_Cafe['Categories'].unique()))

Unique Categories for 'Café':

0                Coffee Shop, Coffee Shops, Coffee Shop
1      Farmers Market, Farmers Markets, Farmer's Market
2                              Bakery, Bakeries, Bakery
3                                     Café, Cafés, Café
4                           Library, Libraries, Library
5                         Wine Bar, Wine Bars, Wine Bar
6                         Tea Room, Tea Rooms, Tea Room
7           Sandwich Place, Sandwich Places, Sandwiches
8      Italian Restaurant, Italian Restaurants, Italian
9                       Donut Shop, Donut Shops, Donuts
10           Breakfast Spot, Breakfast Spots, Breakfast
11    American Restaurant, American Restaurants, Ame...
12                      Bagel Shop, Bagel Shops, Bagels
13                                 Diner, Diners, Diner
14               Gas Station, Gas Stations, Gas Station
15                     Bookstore, Bookstores, Bookstore
16               Supermarket, Supermarkets, Supermarket
17                        Creperie, Creperies, Creperie
18    Arts & Crafts Store, Arts & Crafts Stores, Art...
19                                       Bar, Bars, Bar
20                  Restaurant, Restaurants, Restaurant
21      Frozen Yogurt Shop, Frozen Yogurt Shops, Yogurt
dtype: object
In [14]:
filter_Cafe= ['Coffee Shop', 'Coffee Shops', 'Coffee Shop', 
              'Café', 'Cafés', 'Café',
              'Tea Room', 'Tea Rooms', 'Tea Room']
display(HTML(f"<h4>Filter Set:</h4>"))
for item in filter_Cafe[:-1]: print(item, end=', ')
print(filter_Cafe[-1])  
filter_Cafe = set(filter_Cafe)

Filter Set:

Coffee Shop, Coffee Shops, Coffee Shop, Café, Cafés, Café, Tea Room, Tea Rooms, Tea Room
In [15]:
# FILTERING of 'Coffee places'

idx = []
N_prev = len(df_Cafe)
for i in range(len(df_Cafe)):
    idx.append(filter_Cafe & set(df_Cafe.loc[i,'Categories'].split(', ')) != set())
df_Cafe = df_Cafe[idx].reset_index(drop=True)

display(HTML(f"<Large>The filtered dataframe 'Caf\u00e9' now has {len(df_Cafe)} entries instead of {N_prev}.</Large>"))
The filtered dataframe 'Café' now has 190 entries instead of 241.

Beer and the like . . .

A methodology similar to the above is applied ...

In [16]:
# Concatenate dataframes
df_Beer = pd.concat([df_queries[query] for query in queries[4:]])
df_Beer = df_Beer.reset_index(drop=True)
In [17]:
display(HTML(f"<h4>Unique Categories for 'Beer':</h4>"))
display(pd.Series(df_Beer['Categories'].unique()))

Unique Categories for 'Beer':

0                Beer Garden, Beer Gardens, Beer Garden
1                                        Bar, Bars, Bar
2                                        Pub, Pubs, Pub
3                           Brewery, Breweries, Brewery
4                            BBQ Joint, BBQ Joints, BBQ
5                Coffee Shop, Coffee Shops, Coffee Shop
6                   Beer Store, Beer Stores, Beer Store
7     American Restaurant, American Restaurants, Ame...
8                                     Café, Cafés, Café
9                         Beer Bar, Beer Bars, Beer Bar
10        German Restaurant, German Restaurants, German
11                  Hockey Arena, Hockey Arenas, Hockey
12                        Wine Bar, Wine Bars, Wine Bar
13                Cocktail Bar, Cocktail Bars, Cocktail
14    New American Restaurant, New American Restaura...
15                     Gastropub, Gastropubs, Gastropub
16                         Irish Pub, Irish Pubs, Irish
17                     Pizza Place, Pizza Places, Pizza
18                                 Motel, Motels, Motel
19                  Sports Bar, Sports Bars, Sports Bar
20     Italian Restaurant, Italian Restaurants, Italian
21                     Wings Joint, Wings Joints, Wings
22                                 Hotel, Hotels, Hotel
dtype: object
In [18]:
filter_Beer = ['Beer Garden', 'Beer Gardens', 'Beer Garden',
               'Bar', 'Bars', 'Bar',
               'Pub', 'Pubs', 'Pub',
               'Beer Bar', 'Beer Bars', 'Beer Bar',
               'Irish Pub', 'Irish Pubs', 'Irish']

display(HTML(f"<h4>Filter Set:</h4>"))
for item in filter_Beer[:-1]: print(item, end=', ')
print(filter_Beer[-1])  
filter_Beer = set(filter_Beer)

Filter Set:

Beer Garden, Beer Gardens, Beer Garden, Bar, Bars, Bar, Pub, Pubs, Pub, Beer Bar, Beer Bars, Beer Bar, Irish Pub, Irish Pubs, Irish
In [19]:
# FILTERING of 'Beer places'

idx = []
N_prev = len(df_Beer)
for i in range(len(df_Beer)):
    idx.append(filter_Beer & set(df_Beer.loc[i,'Categories'].split(', ')) != set())
df_Beer = df_Beer[idx].reset_index(drop=True)
display(HTML(f"<Large>The filtered dataframe 'Beer' now has {len(df_Beer)} entries instead of {N_prev}.</Large>"))
The filtered dataframe 'Beer' now has 132 entries instead of 190.

Locations of the filtered Sites:

In [20]:
map_Monroe = folium.Map(location = (PointsOfInt['Rochester']['point'][0],
                                    PointsOfInt['Rochester']['point'][1]), zoom_start=11)
BeerL= FeatureGroup(name='Beer')
CafeL = FeatureGroup(name='Cafe')
IceCreamL = FeatureGroup(name='Ice Cream')

for i in range(len(df_Beer)):
    label = folium.Popup(df_Beer.loc[i,'Name'])    
    folium.CircleMarker([df_Beer.loc[i,'Lat'],df_Beer.loc[i,'Lng']],
                        radius = 5, 
                        popup = label, 
                        color = 'red',
                        fill_color = 'orange',
                        fill_opacity = 0.9).add_to(BeerL)
for i in range(len(df_Cafe)):
    label = folium.Popup(df_Cafe.loc[i,'Name'])    
    folium.CircleMarker([df_Cafe.loc[i,'Lat'],df_Cafe.loc[i,'Lng']],
                        radius = 5, 
                        popup = label, 
                        color = 'black',
                        fill_color = 'gray',
                        fill_opacity = 0.9).add_to(CafeL)
for i in range(len(df_IceCream)):
    label = folium.Popup(df_IceCream.loc[i,'Name'])
    folium.CircleMarker([df_IceCream.loc[i,'Lat'],df_IceCream.loc[i,'Lng']],
                        radius = 5, 
                        popup = label, 
                        color = 'blue',
                        fill_color = 'cyan',
                        fill_opacity = 0.9).add_to(IceCreamL)
map_Monroe.add_child(BeerL)
map_Monroe.add_child(CafeL)
map_Monroe.add_child(IceCreamL)
map_Monroe.add_child(folium.map.LayerControl())

legend_html = """
     <div style="position: fixed; 
     bottom: 50px; left: 50px; width: 100px; height: 90px; 
     border:2px solid grey; z-index:9999; font-size:14px;"> 
     <font size="4" style="color:blue">Ice Cream, </font>
     <font size="3"style="color:black"> Coffee, </font>
     <font size="3"style="color:Red">Beer. </font>
      </div>
    """
map_Monroe.get_root().html.add_child(folium.Element(legend_html))
map_Monroe.save('MapFiles\\map_1.html')
#map_Monroe

Analysis

Back to Table of Contents

Preliminary Assessment

A visit to the village of Fairport (town of Perinton) originally revealed that it has multiple ice cream parlors, coffee or latte shops, breweries, and pubs in close proximity. This led to the hypothesis that the underlying economics of the town can support different venues that tourists or locals typically enjoy in their leisure time. Hence, one may argue, if one type of venue is missing in another location or town, e.g., ice cream parlors, there may be enough demand to support a new business of that missing type.

In essence, all the searches are tied to two groups of terms that serve as surrogates:

  1. Coffee Shop, Coffee Shops, Coffee Shop, Café, Cafés, Café, Tea Room, Tea Rooms, Tea Room
  2. Beer Garden, Beer Gardens, Beer Garden, Bar, Bars, Bar, Pub, Pubs, Pub, Beer Bar, Beer Bars, Beer Bar, Irish Pub, Irish Pubs, Irish

Below is a map to check how Foursquare did on Fairport:

In [21]:
map_Monroe = folium.Map(location = (PointsOfInt['Fairport']['point'][0],
                                    PointsOfInt['Fairport']['point'][1]), zoom_start=15)
BeerL= FeatureGroup(name='Beer')
CafeL = FeatureGroup(name='Cafe')
IceCreamL = FeatureGroup(name='Ice Cream')

for i in range(len(df_Beer)):
    label = folium.Popup(df_Beer.loc[i,'Name'])    
    folium.CircleMarker([df_Beer.loc[i,'Lat'],df_Beer.loc[i,'Lng']],
                        radius = 5, 
                        popup = label, 
                        color = 'red',
                        fill_color = 'orange',
                        fill_opacity = 0.9).add_to(BeerL)
for i in range(len(df_Cafe)):
    label = folium.Popup(df_Cafe.loc[i,'Name'])    
    folium.CircleMarker([df_Cafe.loc[i,'Lat'],df_Cafe.loc[i,'Lng']],
                        radius = 5, 
                        popup = label, 
                        color = 'black',
                        fill_color = 'gray',
                        fill_opacity = 0.9).add_to(CafeL)
for i in range(len(df_IceCream)):
    label = folium.Popup(df_IceCream.loc[i,'Name'])
    folium.CircleMarker([df_IceCream.loc[i,'Lat'],df_IceCream.loc[i,'Lng']],
                        radius = 5, 
                        popup = label, 
                        color = 'blue',
                        fill_color = 'cyan',
                        fill_opacity = 0.9).add_to(IceCreamL)
map_Monroe.add_child(BeerL)
map_Monroe.add_child(CafeL)
map_Monroe.add_child(IceCreamL)
map_Monroe.add_child(folium.map.LayerControl())

legend_html = """
     <div style="position: fixed; 
     bottom: 50px; left: 50px; width: 100px; height: 90px; 
     border:2px solid grey; z-index:9999; font-size:14px;"> 
     <font size="4" style="color:blue">Ice Cream, </font>
     <font size="3"style="color:black"> Coffee, </font>
     <font size="3"style="color:Red">Beer</font>
     <font size="3"style="color:black"> in the village of Fairport. </font>
      </div>
    """
map_Monroe.get_root().html.add_child(folium.Element(legend_html))
map_Monroe.save('MapFiles\\map_2.html')
#map_Monroe

As it turns out, Foursquare missed a few sites:

  1. One ice cream parlor
  2. One coffee shop
  3. Some pubs and one brewery

The interpretation is that these missed localities have just not caught on yet, e.g., the missed brewery is fairly new, or that they are simply not that popular. Or that Foursquare is not the optimal tool for this project. Looking for more suitable APIs, different avenues of approaching the project, methodologies, etc., is certainly desirable but, unfortunately, due to resource constraints, out of scope.

However, we can make the argument that if a site shows up on Foursquare, it is likely to be popular. Hence, using Foursquare data should yield useful results, the downside being that we may miss some opportunities.

Hence, we proceed with the analysis.

Examining Clusters

The purpose is to examine venues in higher-density areas. For this, Density-Based Spatial Clustering of Applications with Noise (DBSCAN) from scikit-learn is employed.

Essentially, the sites are partitioned into core and noise points, the cores representing members of high-density areas. Admittedly, some arbitrary choices are needed. Here we choose:

  1. Maximum distance between two samples for one to be considered as in the neighborhood of the other (walking distance):
    • Ice Cream, Beer: 2000 feet
    • Café: 1000 feet
  2. The number of samples in a neighborhood for a point to be considered as a core point:
    • Ice Cream, Beer: 3
    • Café: 4

Before the cluster analysis can take place, the geographic coordinates must be converted into local map or cartographic coordinates with distance units (feet) rather than angles. For this, 'pyproj' was employed with the local reference EPSG:2261.

In [22]:
# Converting latitude and longitude into x, y cartographic units (in feet)
# Correctness was confirmed by computing the Fairport-Rochester distance.

RocProj = pyproj.Proj("+init=EPSG:2261")     # central NY map ref
xr, yr = RocProj(PointsOfInt['Rochester']['point'][1],PointsOfInt['Rochester']['point'][0])   # ref coord.

# Ice Cream Places
X_ic = np.empty((len(df_IceCream),2))
for i, (lon, lat) in enumerate(zip(df_IceCream['Lng'],df_IceCream['Lat'])):
    X_ic[i,0], X_ic[i,1] = RocProj(lon,lat)  
X_ic -= [xr,yr]    

# Coffee Places
X_ca = np.empty((len(df_Cafe),2))
for i, (lon, lat) in enumerate(zip(df_Cafe['Lng'],df_Cafe['Lat'])):
    X_ca[i,0], X_ca[i,1] = RocProj(lon,lat)  
X_ca -= [xr,yr]    

# Beer Places
X_be = np.empty((len(df_Beer),2))
for i, (lon, lat) in enumerate(zip(df_Beer['Lng'],df_Beer['Lat'])):
    X_be[i,0], X_be[i,1] = RocProj(lon,lat)  
X_be -= [xr,yr]    
In [23]:
# Ice Cream
Clust_ic = DBSCAN(eps=2000, min_samples=3).fit(X_ic)

idx_ic = []
xy_ic_mean = []
lonlat_ic = []
for i in range(max(Clust_ic.labels_)+1):
    idx = np.where(Clust_ic.labels_ == i)[0]
    idx_ic.append(idx)
    xy_ic_mean.append(np.mean(X_ic[idx,:],axis=0))
    lonlat_ic.append(RocProj(xy_ic_mean[i][0]+xr,xy_ic_mean[i][1]+yr,inverse = True))
In [24]:
# Cafe
Clust_ca = DBSCAN(eps=1000, min_samples=4).fit(X_ca)

idx_ca = []
xy_ca_mean = []
lonlat_ca = []
for i in range(max(Clust_ca.labels_)+1):
    idx = np.where(Clust_ca.labels_ == i)[0]
    idx_ca.append(idx)
    xy_ca_mean.append(np.mean(X_ca[idx,:],axis=0))
    lonlat_ca.append(RocProj(xy_ca_mean[i][0]+xr, xy_ca_mean[i][1]+yr, inverse=True))
In [25]:
# Beer
Clust_be = DBSCAN(eps=2000, min_samples=3).fit(X_be)

idx_be = []
xy_be_mean = []
lonlat_be = []
for i in range(max(Clust_be.labels_)+1):
    idx = np.where(Clust_be.labels_ == i)[0]
    idx_be.append(idx)
    xy_be_mean.append(np.mean(X_be[idx,:],axis=0))
    lonlat_be.append(RocProj(xy_be_mean[i][0]+xr, xy_be_mean[i][1]+yr, inverse=True))
In [26]:
map_Monroe = folium.Map(location = (PointsOfInt['Rochester']['point'][0],
                                    PointsOfInt['Rochester']['point'][1]), zoom_start=11)
BeerL= FeatureGroup(name='Beer')
CafeL = FeatureGroup(name='Cafe')
IceCreamL = FeatureGroup(name='Ice Cream')

for i in range(len(idx_be)):
    for j in idx_be[i]:
        label = folium.Popup(df_Beer.loc[j,'Name'])
        folium.CircleMarker([df_Beer.loc[j,'Lat'],df_Beer.loc[j,'Lng']],
                            radius = 5, 
                            popup = label, 
                            color = 'red',
                            fill_color = 'orange',
                            fill_opacity = 0.9).add_to(BeerL)


for i in range(len(idx_ca)):
    for j in idx_ca[i]:
        label = folium.Popup(df_Cafe.loc[j,'Name'])
        folium.CircleMarker([df_Cafe.loc[j,'Lat'],df_Cafe.loc[j,'Lng']],
                            radius = 5, 
                            popup = label, 
                            color = 'black',
                            fill_color = 'gray',
                            fill_opacity = 0.9).add_to(CafeL)

for i in range(len(idx_ic)):
    for j in idx_ic[i]:
        label = folium.Popup(df_IceCream.loc[j,'Name'])
        folium.CircleMarker([df_IceCream.loc[j,'Lat'],df_IceCream.loc[j,'Lng']],
                            radius = 5, 
                            popup = label, 
                            color = 'blue',
                            fill_color = 'cyan',
                            fill_opacity = 0.9).add_to(IceCreamL)

map_Monroe.add_child(BeerL)
map_Monroe.add_child(CafeL)
map_Monroe.add_child(IceCreamL)
map_Monroe.add_child(folium.map.LayerControl())

legend_html = """
     <div style="position: fixed; 
     bottom: 50px; left: 50px; width: 100px; height: 90px; 
     border:2px solid grey; z-index:9999; font-size:14px;">
     <font size="5" style="color:black">Cluster Analysis ("Noise" points left out): </font><br>
     <font size="4" style="color:blue">Ice Cream, </font>
     <font size="3"style="color:black"> Coffee, </font>
     <font size="3"style="color:Red">Beer. </font>
      </div>
    """
map_Monroe.get_root().html.add_child(folium.Element(legend_html))
map_Monroe.save('MapFiles\\map_3.html')
#map_Monroe

Results and Discussion

Back to Table of Contents

Referring to the cluster analysis, not surprisingly the largest clusters for "Beer" and "Coffee" are in the city of Rochester.

The remaining isolated clusters are small. Some exhibit, in a very broad sense, co-locations (approximate locations):

  1. Culver Rd: 'Beer' and 'Ice Cream'
  2. Elmwood Av: 'Coffee' and 'Ice Cream'

Unfortunately, there are not enough points to examine any (loose) correlations through, e.g., a distance matrix. This idea simply did not pan out. This may, in part, as mentioned earlier, be due to a lack of data.
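For reference, such a correlation check could have started from a cross-group distance matrix between cluster centroids, e.g. via SciPy's cdist (a sketch with made-up centroid coordinates in feet, not the actual cluster means):

```python
import numpy as np
from scipy.spatial.distance import cdist

# Hypothetical cluster centroids (x, y in feet, relative to the Rochester reference point)
ic_centroids = np.array([[34000.0,  4000.0], [-2000.0, -12000.0]])    # 'Ice Cream' clusters
be_centroids = np.array([[33000.0,  5000.0], [15000.0,  -8000.0],
                         [-1000.0, -11000.0]])                        # 'Beer' clusters

# Pairwise Euclidean distances in miles (rows: ice cream clusters, columns: beer clusters)
D = cdist(ic_centroids, be_centroids) / 5280.0
print(np.round(D, 2))

# A small entry in a row would flag a 'Beer' cluster co-located with an 'Ice Cream' cluster.
```

With the six 'Ice Cream' clusters found here and only a handful of surrogate clusters outside the city center, such a matrix is too sparse to support statistical conclusions.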

Nevertheless, examining the 'Ice Cream' clusters may yield some insight. The coordinates and associated addresses of the clusters are:

In [27]:
geolocator = Nominatim(user_agent="Monroe_explorer")
df_cluster_IceCream = pd.DataFrame(columns=['Dist. to Rochester [mi]', 'Long.', 'Lat.','Address', 
                                            'Town', 'Sites in Proximity'])
for i in range(len(lonlat_ic)):
    df_cluster_IceCream.loc[i,'Dist. to Rochester [mi]']= np.sqrt(xy_ic_mean[i][0]**2+xy_ic_mean[i][1]**2)/5280
    df_cluster_IceCream.loc[i,'Long.'] = lonlat_ic[i][0]
    df_cluster_IceCream.loc[i,'Lat.'] = lonlat_ic[i][1]
    loc = str(lonlat_ic[i][1])+','+str(lonlat_ic[i][0])
    addrRaw = geolocator.reverse(loc, addressdetails=True)
    addr = addrRaw.address.split(',')
    df_cluster_IceCream.loc[i,'Address'] = addr[-7]+', '+ addr[-3]+', '+ addr[-2]
    df_cluster_IceCream.loc[i,'Town'] = addr[-5]
    df_cluster_IceCream.loc[i,'Sites in Proximity'] = addr[0]
df_cluster_IceCream['Dist. to Rochester [mi]'] = df_cluster_IceCream['Dist. to Rochester [mi]'].astype(float)  
df_cluster_IceCream['Long.'] = df_cluster_IceCream['Long.'].astype(float)   
df_cluster_IceCream['Lat.'] = df_cluster_IceCream['Lat.'].astype(float)   
In [28]:
display(HTML(f"<h3>Locations of the 'Ice Cream Clusters'</h3>"))
df_cluster_IceCream.round(2)

Locations of the 'Ice Cream Clusters'

Out[28]:
Dist. to Rochester [mi] Long. Lat. Address Town Sites in Proximity
0 8.78 -77.46 43.21 Van Ingen Drive, New York, 14580 Webster Town Webster Town Court
1 9.55 -77.44 43.10 North Main Street, New York, 14450 Perinton Town Riki's Family Restaurant
2 6.33 -77.49 43.13 Penfield Road, New York, 14526 Penfield Town Panorama Plaza
3 6.52 -77.54 43.23 Culver Road, New York, 14622 Irondequoit Town 4600
4 5.00 -77.63 43.09 Miracle Mile Drive, New York, 14623 Henrietta Town Panera Bread Company
5 2.25 -77.62 43.12 Elmwood Avenue, New York, 14642 Rochester Tim Hortons/Cams Pizza

Examining these locations, the following was noted:

  1. Two were close to water (1 and 3)
  2. Two were at shopping plazas (2 and 4)
  3. Two were close to parks (0 and 5)

This could make Pittsford Plaza a potential candidate. It already has a 'Coffee' cluster. Other possible candidates are the town of Pittsford and Ontario Beach; they are close to water. Note that they also have 'Beer' clusters.

Even though these are promising candidates, the underlying data is not strong enough, i.e., a more detailed study with a wider span and local scouting is required, which is out of scope.

The purpose of this study was to find potential new locations for ice cream shops within the greater Rochester area. A good location is one where demand is high but that is not yet crowded with competition that may stifle a budding business.

Different venues were explored with Foursquare. Results were filtered, combined, and partitioned into groups. One of them represents 'ice cream', i.e., places with terms associated with 'ice cream'. The other two groups cover venues that may serve as surrogates or that may correlate with preferred locations for ice cream shops. The groups were subjected to cluster analysis. Unfortunately, there was not enough strong data to establish a good enough connection between the surrogate groups and the 'ice cream' group. In part, this was due to insufficient available data.

Cluster analysis of the ice cream shops produced six clusters. Looking at them, common themes were found, such as proximity to water or being at or close to shopping plazas, which suggests a higher chance of customer traffic. This prompted a few potential sites for consideration. Even though promising, the data is not strong enough, and a more extensive study with a wider scope is recommended.